ABSTRACT

Since 1994, the American Psychological Association (APA) has 
advocated the inclusion of effect size indices in reporting research to 
elucidate the statistical significance of studies based on sample size. In 
2001, the fifth edition of the APA Publication Manual stressed the
importance of including an index of effect size to clarify research reports.
Many journals now require authors to include effect-size statistics, but there
is little guidance for researchers to indicate the effect size index they 
should use for univariate and multivariate research designs. This paper 
reviews the methods suggested for reporting effect size, noting similarities 
and differences among them. A table summarizes the formulas for 16 common 
effect size statistics. (Contains 20 references.) (Author/SLD) 
ABSTRACT

Since 1994, the American Psychological Association has advocated the inclusion 
of effect-size reporting in research to elucidate the statistical significance of studies based 
on sample size. In 1999, the APA Task Force on Statistical Inference emphasized that 
effect sizes should always be reported along with p values. In 2001, the 5th edition of the 
APA Publication Manual stressed the importance of including an index of effect size to 
clarify how much difference exists. As a result, many research journals require authors to 
include effect-size statistics. While researchers will comply and follow editorial
leadership in this regard, there is little guidance for investigators on which statistics they
should use to report effect size for univariate and multivariate research designs. This
paper reviews the variety of methods suggested for reporting effect size.
Interpreting and Reporting Effect Sizes in Research Investigations

Introduction 

Statistical significance tests have been essential for social science and educational
research for the past 70 years, but they have been criticized for their dependency on sample
size. In an effort to correct for this limitation, the 1994 edition of the American
Psychological Association (APA) publication manual encouraged the use of effect-size
reporting, and many journals now require it. In 1999, the APA Task Force on Statistical
Inference emphasized that effect sizes should always be reported along with p values.
Subsequently, in 2001, the 5th edition of the APA Publication Manual stressed the
importance of including an index of effect size. Although the concept of effect size has
existed for many years, it remains perplexing to investigators, and reports of effect sizes
remain infrequent. Effect size is the difference between the null and alternative
hypotheses, and it can be measured using either raw or standardized values. At issue is the
probability of getting a statistically significant result if there is a real effect in the
population under examination. If a test is not significant, it is important to know whether this is
because there is no effect or because the research design did not detect it.

A recent review of the literature has revealed at least 61 different effect-size
statistics (Elmore, 2001). Because of this large number, the selection and appropriate
interpretation of effect-size statistics are problematic. Few software programs contain
automatic methods for their determination. The purpose of this paper is to provide an
overview of formulas for computing corrected and uncorrected effect-size statistics, and to
review suggested guidelines for their use in data analysis and reporting for univariate
and multivariate studies.
Kirk (1996) described two major classes of effect sizes, along with a third, “miscellaneous”
category: (a) variance-accounted-for effect sizes and (b) standardized
mean difference effect sizes. Variance-accounted-for effect sizes (VAFE) can always be
computed since all parametric analyses are correlational (Knapp, 1978; Thompson, 1984,
1991). Effect sizes in this class include indices such as $r^2$, $R^2$, and $\eta^2$. A VAFE size is the
ratio of explained variance to total variance. For example, it can be obtained by dividing
the sum-of-squares for an effect by the sum-of-squares total.

In ANOVA, the resulting effect size is called eta squared ($\eta^2$). In multiple
regression, the resulting effect size is called the squared multiple correlation ($R^2$). The
formula in either case is

$$R^2 = \eta^2 = \frac{SOS_{explained}}{SOS_{total}}$$
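
The ratio of explained to total sum-of-squares can be illustrated with a minimal Python sketch for a one-way layout. The function name `eta_squared` and the scores are hypothetical and chosen only for illustration; they do not come from the studies reviewed here.

```python
from statistics import mean


def eta_squared(groups):
    """Return SOS_explained / SOS_total for a one-way layout.

    `groups` is a list of lists of scores, one inner list per group.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Explained (between-groups) sum of squares: squared distance of each
    # group mean from the grand mean, weighted by group size.
    sos_explained = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares: squared distance of every score from the grand mean.
    sos_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return sos_explained / sos_total


# Hypothetical scores for three groups of three participants each.
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(eta_squared(scores))
```

With these made-up scores the explained sum-of-squares is 6 and the total is 12, so half of the total variance is accounted for by group membership.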

Variance-accounted-for effect sizes can range from 0 to 1. The variance accounted
for by the independent variable, $SOS_{explained}$, can range from none of the total
variance exhibited to all of it. Hence, the effect size represents the proportion
(or, multiplied by 100, the percentage) of the total variance explained by the independent variable.

The other class is the standardized difference effect size, which represents the mean
difference in units of the common population standard deviation. Standardized difference
effect sizes differ in how they estimate the standard deviation of the
population. Effect sizes in this class include indices such as Cohen’s d, Glass’s $\Delta$, and
Hedges’s g.
Cohen’s d is the most common example of a standardized effect-size statistic. It
uses all the variance across the groups ($SD_{pooled}$) because it is based on a larger N. The
formula is

$$d = \frac{M_{experimental} - M_{control}}{SD_{pooled}}$$
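
A minimal Python sketch of this computation follows. The paper does not spell out how the pooled standard deviation is formed, so the degrees-of-freedom weighting used here is an assumption, and the function name `cohens_d` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(experimental, control):
    """Standardized mean difference using a pooled standard deviation."""
    n1, n2 = len(experimental), len(control)
    # Assumed pooling convention: sample variances weighted by their
    # degrees of freedom (n - 1) in each group.
    pooled_var = (
        (n1 - 1) * variance(experimental) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(experimental) - mean(control)) / sqrt(pooled_var)


# Hypothetical scores for illustration only.
experimental = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
print(round(cohens_d(experimental, control), 3))
```

A positive value indicates that the experimental mean is higher than the control mean; reversing the argument order flips the sign.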



Another example of a standardized difference effect size is Glass’s $\Delta$, which uses
the SD of only the control group as an estimate of the SD of the population. This statistic
expresses the effect in standard deviation units; it can be positive or negative and is not
bounded by 0 and 1 as variance-accounted-for effect sizes are. It is equivalent to a
Z-score of the standard Normal distribution. Hence, it can be converted into statements
about overlap between the two samples in terms of a comparison of percentiles.
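
The percentile reading of a standardized difference can be sketched in Python as follows. The sketch assumes normally distributed control scores; the function names `glass_delta` and `percentile_of_control` are hypothetical, and `NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist, mean, stdev


def glass_delta(experimental, control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean(experimental) - mean(control)) / stdev(control)


def percentile_of_control(delta):
    """Proportion of the control distribution falling below the average
    experimental score, assuming the control scores are normal."""
    return NormalDist().cdf(delta)


# An effect of half a standard deviation, treated as a Z-score.
print(round(percentile_of_control(0.5), 3))
```

Under the normality assumption, an effect of zero places the average experimental participant at the 50th percentile of the control group, and larger positive effects move that percentile upward.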

The existence of two different metrics with different ranges of values complicates
the interpretation of effect sizes. However, effect sizes in these two classes can be
transformed into the metric of the other. For example, Cohen’s d can be converted to an r
(Cohen, 1988):

$$r = \frac{d}{\sqrt{d^2 + 4}}$$

When the total sample size is small or the group sizes are disparate, the following formula can be used
(Aaron, Kromrey & Ferron, 1998):

$$r = \frac{d}{\sqrt{d^2 + \dfrac{N^2 - 2N}{n_1 n_2}}}$$

Also, an r can be converted to a d (Friedman, 1968):

$$d = \frac{2r}{\sqrt{1 - r^2}}$$
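
The three conversions can be sketched in Python as shown below. The function names are hypothetical; the formulas are the ones attributed above to Cohen (1988), Aaron, Kromrey and Ferron (1998), and Friedman (1968), with the unequal-n version read as d divided by the square root of d squared plus (N squared minus 2N) over the product of the two group sizes.

```python
from math import sqrt


def d_to_r(d):
    """Convert Cohen's d to r, assuming equal group sizes."""
    return d / sqrt(d ** 2 + 4)


def d_to_r_unequal(d, n1, n2):
    """Convert d to r when N is small or the group sizes differ."""
    n_total = n1 + n2
    return d / sqrt(d ** 2 + (n_total ** 2 - 2 * n_total) / (n1 * n2))


def r_to_d(r):
    """Convert r back to d."""
    return 2 * r / sqrt(1 - r ** 2)


for d in (0.2, 0.5, 0.8):
    r = d_to_r(d)
    # Squaring r gives the variance-accounted-for counterpart of d.
    print(d, round(r, 3), round(100 * r ** 2, 1))
```

The equal-n conversion and its inverse undo each other exactly, and squaring the converted r for d = .2, .5, and .8 gives roughly 1.0%, 5.9%, and 13.8% of variance, the percentages reported in Table 1.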

Guidelines for interpreting the magnitude of effects were offered by Cohen (1988), who
cautiously characterized effects as "small," "medium," and "large" for d and $r^2$. A small
effect size (d = .2) is less than a medium effect (d = .5), which in turn is less than a large effect
(d = .8). Cohen interprets a medium effect as one that is visible to the naked eye of a
careful observer. A small effect size, although noticeably smaller than a medium one, is not so small as to be trivial.
Table 1 shows a summary of Cohen’s interpretations.

Table 1. Interpretation of Effect Size

Characterization     d      $r^2$
“low”                0.2    1.0%
“medium”             0.5    5.9%
“large”              0.8    13.8%



Although these standards are commonly used when reporting effect sizes, Huck 
(2000) suggests establishing standards based on the raw units of the instrument used as a 
dependent variable. For example, if the task is the completion of a math drill, the 
researcher might determine that a nontrivial effect should consist of a ten-second drop in 
time between the experimental and control group. This “standard” should be based on 
the standard deviation of the population from which the inference is drawn. If one 
standard deviation for this math test is 20 seconds, it might be argued that half a standard 
deviation difference (10 seconds) is visible to the naked eye and could be regarded as a 
medium effect. The appraisal of effect sizes inherently requires the researcher to 
introduce personal value judgments about the practical or clinical importance of effects. 
As Baugh and Thompson (2001) stress, even small effect estimates may be important 
when the outcomes are critical, such as in life-or-death matters. 
In addition to standardized difference and VAFE sizes, there are “uncorrected”
and “corrected” effect sizes. The theory of “ordinary least squares” used in “classical”
statistical methods, such as ANOVA and regression, tends to capitalize on all the variance
present in the observed sample scores. This variance includes the “sampling error
variance” that is unique to the sample under study. Hence, the VAFE sizes, such as $\eta^2$
and $R^2$, which use this variance, tend to overestimate the effects that would be replicated
in the population or in future samples.

The extent of overestimation, or positive bias, in the sample VAFE size estimate
can be corrected. The corrected effect size is obtained by removing the estimated
sampling error variance. Corrected estimates are always less than or equal to uncorrected
estimates. The corrected VAFE sizes include indices such as adjusted $R^2$, Hays’s $\omega^2$, and
Herzberg’s $R^2$. For the standardized mean difference effect size, a corrected effect size is
Thompson’s “corrected” d. There is more sampling error variance when (a) sample sizes
are smaller, (b) the number of observed variables is larger, or (c) the population effect is
smaller. Hence, it is better to use corrected effect-size statistics if any one of the
following is true:

• F, t, or $R^2$ values are just above the critical level for statistical significance

• N is small

• An initial calculation of an uncorrected effect-size statistic suggests that the
effect size is small

Snyder and Lawson (1993) also suggest use of corrected effect sizes when the ratio of 
participants to dependent variables is less than 5:1. 
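
The gap between uncorrected and corrected estimates can be illustrated with a short Python sketch that applies the eta squared, epsilon squared, and fixed-effects omega squared formulas from Table 2 to the same ANOVA summary values. The function name, the one-way design (so that the total sum-of-squares is the effect plus the error sum-of-squares), and the input numbers are assumptions made only for this illustration.

```python
def anova_effect_sizes(ss_effect, ss_error, df_effect, df_error):
    """Return (eta squared, epsilon squared, omega squared) for a
    one-way fixed-effects ANOVA summary table."""
    ss_total = ss_effect + ss_error  # holds for a one-way design
    ms_error = ss_error / df_error
    # Uncorrected: uses all of the variance observed in the sample.
    eta_sq = ss_effect / ss_total
    # Corrected: remove the estimated sampling error variance first.
    epsilon_sq = (ss_effect - df_effect * ms_error) / ss_total
    omega_sq = (ss_effect - df_effect * ms_error) / (ms_error + ss_total)
    return eta_sq, epsilon_sq, omega_sq


# Hypothetical summary values: three groups of three participants.
eta_sq, epsilon_sq, omega_sq = anova_effect_sizes(6.0, 6.0, 2, 6)
print(round(eta_sq, 3), round(epsilon_sq, 3), round(omega_sq, 3))
```

With so small a sample, the corrected values fall well below the uncorrected eta squared, which is the situation in which the guidelines above recommend reporting a corrected statistic.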
Examples of uncorrected effect-size measures are $\eta^2$, $R^2$, Cohen’s d, and Glass’s
$\Delta$. Some corrected effect measures are adjusted $R^2$, Hays’s $\omega^2$, $\epsilon^2$, and the Ezekiel
formula (Thompson, 2002). Selecting the appropriate effect-size measure among so
many options is complex, not only because of the range of available choices, but also
because there is a lack of common agreement in the field (Thompson, 1999; Snyder &
Thompson, 1998; Vacha-Haase, Nilsson, Reetz, Lance, & Thompson, 2000). Journal
editors apparently welcome any choice of statistic that can be substantiated with reason.
Although there is currently no consensus, an appropriate statistic can
be selected by ensuring that it is consistent with the statistical analysis applied to the data.

The choice of effect-size measure should depend primarily upon the researcher’s
intention to generalize results to other samples or to the population. If a researcher wants
to use results from a previous sample to generalize to future samples, then examples of
effect-size measures to use are $\eta^2$, partial $\eta^2$, and the Herzberg and Lord formulas. Examples of
effect-size measures designed for developing population expectations are adjusted $R^2$,
Hays’s $\omega^2$, and the intraclass correlation $\rho_I$.

Although ANOVA can be considered a special case of regression analysis
(Cohen, 1968), different statistics are used with each analysis. Effect-size measures used
in ANOVA are $\eta^2$, partial $\eta^2$, $\epsilon^2$, Hays’s $\omega^2$, and Cohen’s d. For regression analysis some
effect-size measures are $R^2$, adjusted $R^2$, $\epsilon^2$, and the Ezekiel formula.

Fixed-effect and random-effect designs also have different effect-size measures
associated with them. Fixed models assume that the levels of the factors are fixed in an ANOVA
design or that the values of the predictors are fixed in a regression model. That is, either
all the levels of the independent variables are used or the researcher wants to generalize only to
the levels actually used in the study. A replication would need to use the same levels. In
a random-effect design, the researcher randomly selects the levels of the independent
variables to be used. Generalizations can be made to other levels, and replication studies
could use other randomly selected levels. While the Ezekiel formula and $\epsilon^2$ are
used exclusively for fixed designs, the Herzberg formula and Hays’s $\omega^2$ have alternative
formulas for fixed or random effects.

A univariate design examines the relationship between one or more independent
variables and a single dependent variable. A multivariate design examines multiple
dependent variables. Canonical correlations and multivariate analysis of variance
(MANOVA) are examples of multivariate techniques. Different effect-size statistics are
used for univariate and multivariate analyses. Effect-size statistics used for univariate
analyses are $\eta^2$, partial $\eta^2$, $\epsilon^2$, Hays’s $\omega^2$, $R^2$, the Ezekiel formula, and Cohen’s d. For
multivariate analyses the effect-size statistics to use are $D^2$ and $1 - \Lambda$ (Stevens, 1992).
Table 2 contains formulas for common effect-size statistics.
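
Several of the regression formulas in Table 2 can be compared with a short Python sketch. Here `r2` is the sample $R^2$, `n` is the sample size, and `k` is the number of predictors; the function names and the input values are hypothetical, and the operators are read as subtraction and multiplication in the usual way for these shrinkage formulas.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared as listed in Table 2."""
    return r2 - ((1 - r2) * (k / (n - k - 1)))


def ezekiel(r2, n, k):
    """Ezekiel formula (fixed predictors)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)


def herzberg_random(r2, n, k):
    """Herzberg formula for random predictors."""
    return 1 - (
        ((n - 1) / (n - k - 1))
        * ((n - 2) / (n - k - 2))
        * ((n + 1) / n)
        * (1 - r2)
    )


def lord(r2, n, k):
    """Lord formula."""
    return 1 - (1 - r2) * ((n + k + 1) / (n - k - 1))


# Hypothetical study: R squared of .30 from 50 participants and 5 predictors.
r2, n, k = 0.30, 50, 5
for name, f in [("adjusted", adjusted_r2), ("Ezekiel", ezekiel),
                ("Herzberg random", herzberg_random), ("Lord", lord)]:
    print(name, round(f(r2, n, k), 3))
```

The adjusted $R^2$ and Ezekiel expressions are algebraically the same quantity written two ways, so they print the same value. The Herzberg random and Lord formulas shrink the estimate further, in keeping with their use for generalizing to future samples.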

Conclusions 

The incorrect interpretation of statistical significance has stimulated a movement
to report effect sizes for both significant and nonsignificant results. It is
assumed that the use of effect size can avoid interpretations that may be erroneously applied
to the general population. In other words, reports of a significant difference should be
clarified with the size of the difference. This review has provided a survey of various
methods that have been recommended.

There is no common agreement about which statistical methods to use, and there is disagreement
about the interpretation of the various effect-size statistics. Whenever
possible, there should be an objective method of determination. While overstating the
significance of research results can be ameliorated with effect-size reports, there needs to
be further research and clear strategies for effect-size reporting, perhaps by discipline,
and especially in fields and research topics where interpretation of effect size relies upon
subjective judgment. In such cases, the researcher should provide a clear rationale for
the approach.

Table 2. Formulas of Common Effect Size Statistics.

Statistic                          Formula
Cohen’s d                          $(M_{experimental} - M_{control}) / SD_{pooled}$
Glass’s $\Delta$                   $(M_{experimental} - M_{control}) / SD_{control}$
Hedges’s g                         $(M_{experimental} - M_{control}) / SD_{pooled}$
Eta squared, $\eta^2$              $SS_{effect} / SS_{total}$
Partial eta squared, $\eta_p^2$    $SS_{effect} / (SS_{effect} + SS_{error})$
Epsilon squared, $\epsilon^2$      $(SS_{effect} - (df_{effect})(MS_{error})) / SS_{total}$
Omega squared fixed, $\omega^2$    $(SS_{effect} - (df_{effect})(MS_{error})) / (MS_{error} + SS_{total})$
Omega squared random, $\omega^2$   $(MS_{effect} - MS_{error}) / (MS_{effect} + (df_{total})(MS_{error}))$
Intraclass correlation, $\rho_I$   $(MS_{effect} - MS_{error}) / (MS_{effect} + (df_{effect})(MS_{error}))$
$R^2$                              $SS_{effect} / SS_{total}$
Adjusted $R^2$                     $R^2 - ((1 - R^2) * (k / (n - k - 1)))$
Herzberg fixed                     $1 - ((n - 1) / (n - k - 1)) * (1 - R^2)$
Herzberg random                    $1 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n) * (1 - R^2)$
Ezekiel                            $1 - ((n - 1) / (n - k - 1)) * (1 - R^2)$
Lord                               $1 - (1 - R^2) * ((n + k + 1) / (n - k - 1))$
Mahalanobis $D^2$                  $4F((N - 2) / N) * (df_1 / df_2)$